Uso de instâncias de dados e carga de trabalho para mineração de restrições de integridade
Abstract
Functional dependencies (FDs) are integrity constraints widely studied in the context of data profiling. In this work, we explore the automatic discovery of FDs and describe a method for selecting those that are relevant to the semantics of the workload. The experimental evaluation shows that the selected dependencies exhibit expressive properties compared with the full search space, which demonstrates the effectiveness of the presented approach.
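To make the two steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It assumes a relation stored as a list of dictionaries, uses a naive enumeration of candidate FDs (real discovery algorithms prune the search space far more aggressively), and approximates "workload semantics" by a hypothetical set of attributes referenced by the workload's queries. The names `holds`, `discover_fds`, `select_by_workload`, and the sample data are all illustrative assumptions.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows, max_lhs=2):
    """Naively enumerate minimal FDs X -> A with |X| <= max_lhs."""
    attrs = sorted(rows[0])
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs in lhs:
                    continue  # skip trivial FDs
                # Skip non-minimal FDs: a proper subset of lhs already determines rhs.
                if any(set(l) < set(lhs) and r == rhs for l, r in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


def select_by_workload(fds, workload_attrs):
    """Keep FDs whose attributes all occur in the workload (assumed criterion)."""
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) | {rhs} <= workload_attrs]


# Hypothetical sample relation.
rows = [
    {"zip": "01000", "city": "Sao Paulo", "name": "Ana"},
    {"zip": "01000", "city": "Sao Paulo", "name": "Bruno"},
    {"zip": "20000", "city": "Rio", "name": "Ana"},
]
fds = discover_fds(rows)
relevant = select_by_workload(fds, {"zip", "city"})
```

The selection criterion here is deliberately simple. Any measure of relevance derived from the workload could replace the subset test in `select_by_workload` without changing the discovery step.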
Similar resources
Impacto da amostragem aleatória uniforme para o aumento da escalabilidade na geração de agrupamentos hierárquicos de séries espaço-temporais
This paper presents the results of a scalable approach to building hierarchical clusterings from space-time series. The goal is to reduce complexity in terms of space and time. The approach explores data-sampling pre-processing techniques to reduce the numerosity of the data. The experiment indicates that strategies more efficient than naive sample selection need to be developed (...
Uma Estratégia para Seleção de Atributos Relevantes no Processo de Resolução de Entidades
Data integration is an essential task for achieving a unified view of data stored in heterogeneous and distributed sources. A key step in this process is Entity Resolution, which consists of identifying instances that refer to the same real-world entity. Functions that evaluate the similarity between attribute values are used to identify equivalent instances. This work proposes a strate...
Uma Avaliação de Eficiência e Eficácia da Combinação de Técnicas para Deduplicação de Dados
Data deduplication is the task of identifying and eliminating duplicate records in a single database. It is a complex process that involves several steps, including defining a blocking key, a similarity function, and an indexing method. There are several approaches for each of these steps. In this context, the objective of this work is to find the best combination of such algorithms aiming to improve...
Modelagem de Bancos de Dados em Tempo-real
In this work we introduce a method for modeling real-time databases (RTDBs) using an object-based Petri net notation called EG-CPN. This notation is enriched to support the efficient description of models that integrate RTDBs and real-time systems (RTSs). The method provides the designer with constructs that make it possible, for example, to declare constraints...
Workload and associated factors: a study in maritime port in Brazil
Objective: to identify the effect of the mental, physical, temporal, performance, total effort and frustration demands in the overall workload, and in the same way analyze the global burden of port labor and associated factors that contribute most to their decrease or increase. Method: a cross-sectional, quantitative study, developed with 232 dock workers. For data collection, a structured qu...
Publication date: 2017